{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# 使用 Langchain 和 GLM 完成简单的简历信息抽取任务。\n",
    "\n",
    "**This tutorial is Only in Chinese explanation**\n",
    "\n",
    "本代码，我将使用 Langchain 配合 GLM-4 完成 Word 文档的简历信息抽取任务。\n"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "f9c7735c37bd4a71"
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 1. 配置相关环境\n",
    "由于需要安装部分依赖，这里我们需要安装必要的依赖，如果你已经安装了这些依赖，你可以跳过这个步骤。"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "71f6db85940dd9ad"
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: langchain in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (0.1.4)\r\n",
      "Requirement already satisfied: unstructured in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (0.12.3)\r\n",
      "Requirement already satisfied: python-docx in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (1.1.0)\r\n",
      "Requirement already satisfied: PyYAML>=5.3 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from langchain) (6.0.1)\r\n",
      "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from langchain) (2.0.25)\r\n",
      "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from langchain) (3.9.3)\r\n",
      "Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from langchain) (4.0.3)\r\n",
      "Requirement already satisfied: dataclasses-json<0.7,>=0.5.7 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from langchain) (0.6.3)\r\n",
      "Requirement already satisfied: jsonpatch<2.0,>=1.33 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from langchain) (1.33)\r\n",
      "Requirement already satisfied: langchain-community<0.1,>=0.0.14 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from langchain) (0.0.16)\r\n",
      "Requirement already satisfied: langchain-core<0.2,>=0.1.16 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from langchain) (0.1.17)\r\n",
      "Requirement already satisfied: langsmith<0.1,>=0.0.83 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from langchain) (0.0.84)\r\n",
      "Requirement already satisfied: numpy<2,>=1 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from langchain) (1.26.3)\r\n",
      "Requirement already satisfied: pydantic<3,>=1 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from langchain) (2.5.3)\r\n",
      "Requirement already satisfied: requests<3,>=2 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from langchain) (2.31.0)\r\n",
      "Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from langchain) (8.2.3)\r\n",
      "Requirement already satisfied: chardet in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured) (3.0.4)\r\n",
      "Requirement already satisfied: filetype in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured) (1.2.0)\r\n",
      "Requirement already satisfied: python-magic in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured) (0.4.27)\r\n",
      "Requirement already satisfied: lxml in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured) (5.1.0)\r\n",
      "Requirement already satisfied: nltk in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured) (3.8.1)\r\n",
      "Requirement already satisfied: tabulate in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured) (0.9.0)\r\n",
      "Requirement already satisfied: beautifulsoup4 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured) (4.8.2)\r\n",
      "Requirement already satisfied: emoji in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured) (2.10.1)\r\n",
      "Requirement already satisfied: python-iso639 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured) (2024.1.2)\r\n",
      "Requirement already satisfied: langdetect in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured) (1.0.9)\r\n",
      "Requirement already satisfied: rapidfuzz in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured) (3.6.1)\r\n",
      "Requirement already satisfied: backoff in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured) (2.2.1)\r\n",
      "Requirement already satisfied: typing-extensions in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured) (4.9.0)\r\n",
      "Requirement already satisfied: unstructured-client>=0.15.1 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured) (0.17.0)\r\n",
      "Requirement already satisfied: wrapt in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured) (1.16.0)\r\n",
      "Requirement already satisfied: aiosignal>=1.1.2 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.1)\r\n",
      "Requirement already satisfied: attrs>=17.3.0 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (23.2.0)\r\n",
      "Requirement already satisfied: frozenlist>=1.1.1 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.4.1)\r\n",
      "Requirement already satisfied: multidict<7.0,>=4.5 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (6.0.4)\r\n",
      "Requirement already satisfied: yarl<2.0,>=1.0 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.9.4)\r\n",
      "Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from dataclasses-json<0.7,>=0.5.7->langchain) (3.20.2)\r\n",
      "Requirement already satisfied: typing-inspect<1,>=0.4.0 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from dataclasses-json<0.7,>=0.5.7->langchain) (0.9.0)\r\n",
      "Requirement already satisfied: jsonpointer>=1.9 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from jsonpatch<2.0,>=1.33->langchain) (2.4)\r\n",
      "Requirement already satisfied: anyio<5,>=3 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from langchain-core<0.2,>=0.1.16->langchain) (4.2.0)\r\n",
      "Requirement already satisfied: packaging<24.0,>=23.2 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from langchain-core<0.2,>=0.1.16->langchain) (23.2)\r\n",
      "Requirement already satisfied: annotated-types>=0.4.0 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from pydantic<3,>=1->langchain) (0.6.0)\r\n",
      "Requirement already satisfied: pydantic-core==2.14.6 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from pydantic<3,>=1->langchain) (2.14.6)\r\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from requests<3,>=2->langchain) (3.3.2)\r\n",
      "Requirement already satisfied: idna<4,>=2.5 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from requests<3,>=2->langchain) (3.6)\r\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from requests<3,>=2->langchain) (2.2.0)\r\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from requests<3,>=2->langchain) (2023.11.17)\r\n",
      "Requirement already satisfied: dataclasses-json-speakeasy>=0.5.11 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured-client>=0.15.1->unstructured) (0.5.11)\r\n",
      "Requirement already satisfied: jsonpath-python>=1.0.6 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured-client>=0.15.1->unstructured) (1.0.6)\r\n",
      "Requirement already satisfied: mypy-extensions>=1.0.0 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured-client>=0.15.1->unstructured) (1.0.0)\r\n",
      "Requirement already satisfied: python-dateutil>=2.8.2 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured-client>=0.15.1->unstructured) (2.8.2)\r\n",
      "Requirement already satisfied: six>=1.16.0 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from unstructured-client>=0.15.1->unstructured) (1.16.0)\r\n",
      "Requirement already satisfied: soupsieve>=1.2 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from beautifulsoup4->unstructured) (2.5)\r\n",
      "Requirement already satisfied: click in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from nltk->unstructured) (8.1.7)\r\n",
      "Requirement already satisfied: joblib in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from nltk->unstructured) (1.3.2)\r\n",
      "Requirement already satisfied: regex>=2021.8.3 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from nltk->unstructured) (2023.12.25)\r\n",
      "Requirement already satisfied: tqdm in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from nltk->unstructured) (4.66.1)\r\n",
      "Requirement already satisfied: sniffio>=1.1 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from anyio<5,>=3->langchain-core<0.2,>=0.1.16->langchain) (1.3.0)\r\n",
      "Requirement already satisfied: exceptiongroup>=1.0.2 in /Users/zr/Code/glm-cookbook/venv/lib/python3.9/site-packages (from anyio<5,>=3->langchain-core<0.2,>=0.1.16->langchain) (1.2.0)\r\n",
      "\u001B[33mDEPRECATION: textract 1.6.5 has a non-standard dependency specifier extract-msg<=0.29.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of textract or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001B[0m\u001B[33m\r\n",
      "\u001B[0m"
     ]
    }
   ],
   "source": [
    "!pip install langchain unstructured python-docx  "
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-02-02T13:52:57.738172Z",
     "start_time": "2024-02-02T13:52:56.064699Z"
    }
   },
   "id": "4fa261f2679ba7e3"
  },
  {
   "cell_type": "markdown",
   "source": [
    "接着，我们需要将我们的 API_KEY 配置到环境变量中，用于调用 GLM-4 模型。"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "741ecbd0ae7b3148"
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"ZHIPUAI_API_KEY\"] = \"your api key\""
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-02-02T13:52:57.742761Z",
     "start_time": "2024-02-02T13:52:57.740017Z"
    }
   },
   "id": "5f982f839d2340d3"
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 2. 读取简历文档\n",
    "\n",
    "我们需要将简历的文档用 Langchain 的 UnstructuredWordDocumentLoader 读入，并填充到我们的模板中。"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "cd69981c5a6fb246"
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "outputs": [],
   "source": [
    "from langchain_community.document_loaders import UnstructuredWordDocumentLoader\n",
    "\n",
    "loader = UnstructuredWordDocumentLoader(\"data/resume.docx\")\n",
    "data = loader.load()"
   ],
   "metadata": {
    "collapsed": true,
    "ExecuteTime": {
     "end_time": "2024-02-02T13:52:57.777970Z",
     "start_time": "2024-02-02T13:52:57.743554Z"
    }
   },
   "id": "initial_id"
  },
  {
   "cell_type": "markdown",
   "source": [
    "设定好模板，这个模板将会作为我们的系统提示词。我们将使用 Langchain 的 ChatPromptTemplate 来构建这个模板。其中，{resume} 将会被我们的简历内容填充。"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "90f6dca3b6fec516"
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "outputs": [],
   "source": [
    "from langchain_core.prompts import SystemMessagePromptTemplate\n",
    "from langchain.schema import HumanMessage, AIMessage\n",
    "from langchain.prompts import ChatPromptTemplate\n",
    "\n",
    "system_prompt = \"\"\"\n",
    "你是 ZhipuAI 的 人事资源管理部门的优秀员工，现在我需要你帮我阅读简历并筛选出合适的人才，请你基于我提供的简历，对简历进行细节的分析，抓取相关的资料并回答我提出的问题。\n",
    "现在，我将会将简历以文字的形式给你提供，具体内容如下:\n",
    "\n",
    "<resume>\n",
    "{resume}\n",
    "</resume>\n",
    "\n",
    "请你根据我的简历，开始回答我的问题吧。请注意我的提问的内容和我需要你回答的格式，我们开始吧：\n",
    "\"\"\"\n",
    "\n",
    "question_prompt = [\n",
    "    \"候选人读过哪些大学？\",\n",
    "    # \"请帮我提取候选人简历中的关键信息，用JSON格式返回给我，我需要的字段是：姓名、性别、年龄、学历、工作年限、工作经历、项目经历、技能、个人优势、个人缺点、兴趣爱好；简历中没有提到的字段也要输出，但字段值为空。json的key可以使用中文，value的长度不要超过100个字符，如果字段值太长，请对内容进行总结摘要再输出。例如工作经历可以只保留公司名称和职位，工作经历和项目经历可以只保留项目名称和项目描述\",\n",
    "    # \"你怎么评价这个候选人，从他现有的资历、技术能力、工作态度、发展潜力进行分析。我们公司目前想招聘一个3-5年工作经验有一定的发展潜力的员工，请结合对候选人的分析和我的招聘需求判断我是否应该给他面试机会？\"\n",
    "]\n",
    "\n",
    "chat_template = ChatPromptTemplate.from_messages(\n",
    "    [\n",
    "        SystemMessagePromptTemplate.from_template(system_prompt),\n",
    "    ]\n",
    ")\n",
    "messages = chat_template.format_messages(resume=data)"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-02-02T13:52:57.783016Z",
     "start_time": "2024-02-02T13:52:57.781318Z"
    }
   },
   "id": "c500b35994e14ff2"
  },
  {
   "cell_type": "markdown",
   "source": [
    "接着，我们就可以调用 GLM-4 模型，通过模型对简历进行抓取和提取关键信息，获得有效的内容和答案。"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "e01263d1a189281f"
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "候选人李明浩读过的大学有：\n",
      "\n",
      "1. 清华大学，专业是计算机科学与技术。\n",
      "2. 北京邮电大学，专业是信息技术。\n"
     ]
    }
   ],
   "source": [
    "from langchain_community.chat_models import ChatZhipuAI\n",
    "\n",
    "for question in question_prompt:\n",
    "    messages = chat_template.format_messages(resume=data)\n",
    "    messages.append(\n",
    "        HumanMessage(\n",
    "            content=question\n",
    "        )\n",
    "    )\n",
    "    llm = ChatZhipuAI(\n",
    "        temperature=0.01,\n",
    "        model=\"glm-4\",\n",
    "        max_tokens=8192,\n",
    "        stream=False,\n",
    "    )\n",
    "    messages.append(\n",
    "        AIMessage(\n",
    "            content=llm(messages).content\n",
    "        )\n",
    "    )\n",
    "    print(messages[-1].content)"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-02-02T13:53:00.080689Z",
     "start_time": "2024-02-02T13:52:57.785262Z"
    }
   },
   "id": "1f6fc2837b9d98b5"
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 3. 结果分析\n",
    "\n",
    "通过大模型，我们可以顺利的抽取出简历中的关键信息，包括教育背景、工作经历等。这样，我们就可以通过简单的代码，完成简历信息的抽取任务。\n",
    "这是一个开放性的demo，意味着你可以自己选择其他任务来接着完成这个场景的研究。"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "a78cbd80584dfa28"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
